Mining tree-query associations in graphs

نویسندگان

  • Eveline Hoekx
  • Jan Van den Bussche
چکیده

New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We introduce a novel class of tree-shaped patterns called tree queries, and present algorithms for mining tree queries and tree-query associations in a large data graph. Novel about our class of patterns is that they can contain constants, and can contain existential nodes which are not counted when determining the number of occurrences of the pattern in the data graph. Our algorithms have a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a practical, database-oriented implementation in SQL, and show that the approach works in practice through experiments on data about food webs, protein interactions, and citation analysis.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Graph Indexing: Tree + Delta >= Graph

Recent scientific and technological advances have witnessed an abundance of structural patterns modeled as graphs. As a result, it is of special interest to process graph containment queries effectively on large graph databases. Given a graph database G, and a query graph q, the graph containment query is to retrieve all graphs in G which contain q as subgraph(s). Due to the vast number of grap...

متن کامل

Scalable Evaluation of k-NN Queries on Large Uncertain Graphs

Large graphs are prevalent in social networks, traffic networks, and biology. These graphs are often inexact. For example, in a friendship network, an edge between two nodesu andv indicates that users u and v have a close relationship. This edge may only exist with a probability. To model such information, the uncertain graph model has been proposed, in which each edge e is augmented with a pro...

متن کامل

Kernel-based Similarity Search in Massive Graph Databases with Wavelet Trees

Similarity search in databases of labeled graphs is a fundamental task in managing graph data such as XML, chemical compounds and social networks. Typically, a graph is decomposed to a set of substructures (e.g., paths, trees and subgraphs) and a similarity measure is defined via the number of common substructures. Using the representation, graphs can be stored in a document database by regardi...

متن کامل

Towards Memory-Efficient Answering of Tree-Shaped SPARQL Queries using GPUs

We present an idea of efficient query answering over an RDF dataset employing a consumer-grade graphic card for an efficient computation. We consider tree-shaped SPARQL queries and static datasets, to facilitate data mining over RDF graphs in warehouse-like setups. Reasons to see the poster: a) presentation of the approach with examples; b) possibility of discussion about the implementation det...

متن کامل

Analyzing SQL Query Logs using Multi-Relational Graphs

Computer Science 6 (Data Management), FAU Erlangen-Nürnberg {andreas.wahl|richard.lenz}@fau.de Analytical SQL queries are a valuable source of information. They contain expert knowledge that cannot be inferred from schemas or content alone. Consider, for example, data lake scenarios, where relational and semi-structured data sources are combined in a single storage and processing environment. D...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • CoRR

دوره abs/1008.2626  شماره 

صفحات  -

تاریخ انتشار 2010